Density-based Clustering

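Density-based methods group data points by how densely they are packed rather than by their distance to a cluster center: a cluster is a region in which every example has enough close neighbors.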

How it works

  1. Define two hyperparameters:
    • a distance parameter ϵ = the maximum distance at which two examples count as neighbors
    • a quantity parameter n = the minimum number of examples within distance ϵ of an example required to form or expand a cluster
  2. Pick an example x from your dataset at random and assign it to cluster 1, then count how many examples lie at a distance less than or equal to ϵ from x (its ϵ-neighbors).
    • If this count is greater than or equal to n, put all these ϵ-neighbors in cluster 1 as well.
    • Examine each member of cluster 1 and find its ϵ-neighbors. If a member of cluster 1 has n or more ϵ-neighbors, expand cluster 1 by adding those ϵ-neighbors to it.
  3. Continue expanding cluster 1 until there are no more examples to put in it.
  4. Pick another example from the dataset that does not belong to any cluster and put it in cluster 2. Continue like this until every example either belongs to some cluster or is marked as an outlier. An outlier is an example whose ϵ-neighborhood contains fewer than n examples and that was not added to any cluster during expansion.
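
The steps above describe the procedure commonly known as DBSCAN. It can be sketched in plain Python as shown below. This is a minimal illustration, not a reference implementation. The names `dbscan`, `region_query` and `NOISE` are hypothetical, and the sketch makes several assumptions the text does not state: distance is Euclidean, an example counts as its own ϵ-neighbor, outliers are labeled -1, and examples are visited in index order rather than at random so that the result is reproducible.

```python
import math

NOISE = -1  # assumed label for outliers


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i].

    Assumption: a point counts as its own neighbor.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, n):
    """Label each point with a cluster number (1, 2, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):  # index order instead of random picks
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < n:
            # Tentatively an outlier; a later cluster may still absorb it.
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:  # keep expanding until nothing new can be added
            j = queue.pop()
            if labels[j] == NOISE:
                # Too few neighbors to expand from, but reachable: joins cluster.
                labels[j] = cluster
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= n:
                queue.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, n=3)
print(labels)  # [1, 1, 1, 2, 2, 2, -1]
```

The two tight groups become clusters 1 and 2, while the isolated point (50, 50) has only itself in its ϵ-neighborhood and is marked as an outlier. Each call to `region_query` scans the whole dataset, so this sketch takes quadratic time; practical implementations typically use a spatial index to find neighbors faster.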

Examples